Abstract — Convergence of Extremum Seeking (ES) algorithms has been established in the limit of small gains. Using averaging theory and contraction analysis, we propose a framework for computing explicit bounds on the departure of the ES scheme from its ideal dominant-order average dynamics. The bounds remain valid for possibly large gains. They allow us to establish stability and estimate convergence rates, and they open the way to selecting "optimal" finite gains for the ES scheme.

I. Introduction 

Extremum Seeking (ES) is a special class of algorithms designed to optimize dynamic systems [1]. Typically, traditional unconstrained optimization algorithms are designed to find the maximum of a map $h : \mathbb{R}^n \to \mathbb{R}$ through a sequence of evaluations $h(x_n)$. In the kind of problem addressed by ES, the map parameters are continuous functions of time $x(t) \in \mathbb{R}^n$ and the map output is continuously measured, possibly through a dynamic system $h([z], x)$ where $[z]$ represents the possible internal dynamics [2].

Extremum Seeking can be traced back to [3] and was an active field until the third quarter of the 20th century [4]. After lying dormant during the following decades, it has gained much attention since the early 2000s, in part after the theoretical progress in [5], [6], where the first proof of stability for the ES scheme is provided. ES has developed rapidly in the last decade and its range of applications is expanding with the spread of large scale and low cost autonomous dynamic systems and robots [7], [8].

Design and stability analyses of ES schemes have been presented for several classes of systems that follow a similar structure [9]:

• A particular gradient related method is to be mimicked. The gradient and perhaps the Hessian [10] of the system at its operating point are estimated by introducing oscillatory perturbations to the input and measuring the correlation with the evolution of the output, and are used to define a direction of search.

• The loop is closed by setting the operating point $x$ to drift on a slow timescale along the search direction.

• By design, the slow time dynamics of $x$ follows a minimization direction, so that the slow time equation typically reduces to $\partial_t x = k\,\overline{\nabla h}(x)$ or $k\,\overline{H^{-1}\nabla h}$ (the over-line represents the time average over a dither period).

• From there, averaging theory and singular perturbation are invoked and it is shown that if the gains used in the design of the ES scheme are small enough, $x$ converges to a limit cycle in a neighborhood of the optimal point $x^*$ [6].
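The correlation step in the structure listed above can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical quadratic map `h`, a sinusoidal dither, and a helper named `correlation_gradient` introduced only for illustration. It averages $h(x + a\sin t)\sin t$ over one dither period and rescales by $2/a$, which for small $a$ should approximate $h'(x)$.

```python
import math

def correlation_gradient(h, x, a, n=2000):
    """Estimate h'(x) by correlating the output with the dither.

    Averages h(x + a*sin t) * sin t over one period (uniform grid of n
    points) and rescales by 2/a. Helper name and signature are hypothetical.
    """
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * k / n
        total += h(x + a * math.sin(t)) * math.sin(t)
    return (2.0 / a) * total / n

def h(x):
    # Hypothetical map with its minimum at x = 1 (not from the paper).
    return (x - 1.0) ** 2

estimate = correlation_gradient(h, x=0.3, a=0.1)
exact = 2.0 * (0.3 - 1.0)  # analytic derivative of the hypothetical map
print(estimate, exact)
```

For a quadratic map the estimate is exact up to rounding; for a general map the mismatch grows with the dither amplitude, which is one source of the trade-off discussed in the rest of the paper.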



By doing so, only the stability in the limit of infinitely small gains is proven. The speed of convergence, however, is driven by the amplitude of those small gains. For practical applications it is therefore important to know how large they can be while convergence is maintained, search precision is quantified and a satisfactory search speed is provided. Although some basic work and scaling remarks have been made in this direction [6], [9], [11]-[13], no explicit solution has been provided yet, let alone for a nonlinear analysis.

The objective of this paper is to show that those limitations can be overcome if two quantitative tools, averaging theory up to higher orders and singular perturbation theory revisited by contraction theory [14]-[16], are used. The basic idea is to bound or estimate the departure of the real system from the ideal, designed system, and to build an auxiliary optimization problem called meta-optimization that will select the small gains so that the errors remain below a fixed tolerance. Just as the small gain stability has been proven for specific algorithms (as opposed to the whole class of ES algorithms), the same goes for the finite gain theory. In part 2 we summarize the averaging theory as presented in Chapter 3 of [17] and discuss its application to finite order averaging before stating the results from the contraction analysis of singularly perturbed systems [16] and showing how they can be combined for the analysis of ES schemes. In part 3 we apply those quantitative tools to the one dimensional map maximization, first in its simplest form as in section 4.1 of [6], and then in its high and low pass filtered version [5]. We then extend the first case to the optimization of a black box dynamic system. We finish part 3 by illustrating on the simple 2-dimensional case how higher order averaging can be used as a qualitative design tool to avoid high order, undesired terms.

We conclude this introduction by mentioning that the analysis, presented here on 1-D objective maps for the sake of simplicity, extends straightforwardly to multidimensional cases, should apply seamlessly to applications such as the non-holonomic search [18], and constitutes an interesting direction of research in the more general setup of stochastic ES [19].



II. Theory 

A. Averaging 

1) Fundamental relation of averaging: Let $f$, $w$ be smooth vector fields in $\mathbb{R}^n$. We note $D_w h = \nabla h \cdot w$ and $L_w h = D_w h - D_h w$. The results from averaging theory ([17] Chapter 3 and [20], Annex C) that are used in the present paper rely on the following relations: consider the dynamic system

$$\dot{x} = f(x) \qquad (1)$$

and the new variable $y$ defined by the implicit transformation

$$x = e^{D_w} y \qquad (2)$$

The dynamics of $y$ is given by the equation $\dot{y} = g_\infty(y)$ with

$$g_\infty(y) = e^{L_w} f(y) \qquad (3)$$

where the exponential of an operator has its usual definition:

$$e^{D_w} = 1 + D_w + \frac{D_w^2}{2!} + \frac{D_w^3}{3!} + \dots, \qquad e^{L_w} = 1 + L_w + \frac{L_w^2}{2!} + \frac{L_w^3}{3!} + \dots$$

Simply put, this theorem means that given a variable change $x = U_\infty(y) = e^{D_w} y$ (the introduction of the generator function $w$ can be thought of as an indirect way to define $U_\infty$), the dynamics for the new variable takes the simple form given by equation (3).

2) Interpretation when f and w are written as series: If $f(x) = \sum_{i=1}^{\infty} \varepsilon^i f_i(x)$ and $w(y) = \sum_{i=1}^{\infty} \varepsilon^i w_i(y)$, where $\varepsilon$ is a given scalar, the previous relations must hold for all orders in $\varepsilon$. Collecting the terms of same order together, the previous expressions become

$$x = U_\infty(y) = y + \sum_{i=1}^{\infty} \varepsilon^i u_i(y)$$

and

$$\dot{y} = g_\infty(y) = \sum_{i=1}^{\infty} \varepsilon^i g_i(y)$$

The $u_i$'s are sums and products of $w_j$'s up to order $i$ and their derivatives up to order $i-1$, while the $g_i$'s are sums and products of $f_j$'s up to order $i$, $w_j$'s up to order $i-1$ and their derivatives up to order $i-1$. In the present paper, we do not venture deeper into the properties of the $g_i$'s and $w_i$'s. At this point it is important to mention that, although their expressions are non trivial (especially in the case of non-autonomous systems, which are of interest to us), they can be computed algorithmically at each order.

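3) Application to a non autonomous system: Let's consider the system

$$\dot{x} = f(x,t) \qquad (4)$$

and augment it with $x_{n+1} = t$ into its autonomous form:

$$X = \begin{bmatrix} x \\ t \end{bmatrix}, \qquad \dot{X} = F(X) = \begin{bmatrix} f(x,t) \\ 1 \end{bmatrix}$$

The relations of the previous section still hold, and the differential operators $L$ and $D$ of the $n+1$ dimensional, spatiotemporal system can be rewritten as functions of their $n$ dimensional, spatial counterparts. Assuming $W = [w^T\ 0]^T$ we get

$$D_W^p X = \begin{bmatrix} D_w^p x \\ 0 \end{bmatrix},\ p \ge 1, \qquad L_W F = \begin{bmatrix} L_w f - \partial_t w \\ 0 \end{bmatrix}, \qquad L_W^n F = \begin{bmatrix} L_w^{n-1}\left(L_w f - \partial_t w\right) \\ 0 \end{bmatrix},\ n \ge 2$$

Finally, if we note $\mathcal{L}$ the modified non autonomous Lie bracket such that $\mathcal{L}_w = L_w(\cdot) - \partial_t w$ and $\mathcal{L}_w^p = L_w \mathcal{L}_w^{p-1}$, $p \ge 2$, the non autonomous system can be transformed by

$$x = e^{D_w} y = U_\infty(y,t) = y + \delta U_\infty(y,t)$$

into

$$\dot{y} = g_\infty(y,t)$$

with

$$g_\infty(y,t) = e^{\mathcal{L}_w} f(y,t) \qquad (5)$$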



4) f and w as series expansions, take 2: Let's assume that $f(x,t)$ and $w(x,t)$ can be written as series in $\varepsilon$, as in part II-A.2. It is noticeable that, in the right hand side of equation (5), $\partial_t w_i$ appears first at order $i$ in the Taylor expansion while $w_i$ doesn't appear until order $\varepsilon^{i+1}$. At each order in $\varepsilon$ in equation (5), we therefore have:

$$g_i(y,t) = -\partial_t w_i + E_i(y,t)$$

where $E_i(y,t)$ can be expressed as sums, products and derivatives of $f_j$, $j \le i$, $w_j$, $j \le i-1$ and their derivatives. In other words, by choosing

$$w_i(y,t) = \int E_i(y,t)\,dt + \kappa_i(y) \qquad (6)$$

$g_i$ is independent of time at each order. At this point, it is important to note that if $f$ is $T$-periodic (as will be assumed in the following), so are the $E_i$'s, $w_i$'s and, as a consequence, $U_\infty$. Also, if the $\kappa_i$'s are chosen to be 0, $y$ and $x$ coincide at $t = 0\ [T]$. A probably more appropriate choice, used in the present article, is to set $\kappa_i$ so that $\overline{u_i} = 0$ for $i \ge 1$. It leads to $\overline{x} = y$.

From this, we can conclude that there exists an algorithmic way to transform the non autonomous system $\dot{x} = \sum_{i \ge 1} \varepsilon^i f_i(x,t)$ into an autonomous $\dot{y} = \sum_{i \ge 1} \varepsilon^i g_i(y)$ at each order in $\varepsilon$ by choosing an appropriate change of variables. That change of variables is periodic in time if $f$ is.

5) Finite order averaging: The previous sections provide an algorithmic way to build a change of variables that transforms the non autonomous, periodically driven, dynamic system into an autonomous one. For practical applications however, it is important to note that the variable change can only be performed up to a finite order $n$. Consider that the previous algorithm has been carried out up to order $n$ so that the $w_i$, $i \le n$ have been defined. We note $\hat{w}$ the truncation of $w$. Then, $e^{D_{\hat{w}}} y$ can be computed and truncated into $U_n(y,t) = \hat{U}(y,t)$ that defines the relationship between $y$ and $x$. This change of variables is then used to compute the dynamics of $y$. Differentiating

$$x = \hat{U}(y,t)$$

we get

$$f(\hat{U}(y,t),t) = \partial_y \hat{U}\,\dot{y} + \partial_t \hat{U}$$

Assuming that $\hat{U}$ remains invertible at all times:

$$\dot{y} = [\partial_y \hat{U}]^{-1}\left(f(\hat{U}(y,t),t) - \partial_t \hat{U}\right) \qquad (7)$$

By construction, the dynamics of $y$ is autonomous up to order $n$. The above equation can therefore be rewritten as

$$\dot{y} = \hat{g}(y) + R_g(y,t)$$

where $\hat{g}$ is the truncation of $g_\infty$ and $R_g(y,t) = O(\varepsilon^{n+1})$ arises from the fact that the averaging is only carried up to order $n$.
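A toy case makes the role of the truncated change of variables concrete. The sketch below is an illustration under stated assumptions, not an example from the paper: it uses the hypothetical scalar system $\dot{x} = -\varepsilon x (1 + \cos t)$, whose exact solution $x(t) = x_0 e^{-\varepsilon(t + \sin t)}$ is known, so that the averaged solution $y(t) = x_0 e^{-\varepsilon t}$ and the first order correction $x \approx y\,(1 - \varepsilon \sin t)$ can be compared without numerical integration.

```python
import math

eps, x0 = 0.1, 1.0                        # hypothetical small parameter and initial state
times = [0.01 * k for k in range(5001)]   # t in [0, 50]

def x_exact(t):
    # Exact solution of x' = -eps * x * (1 + cos t), x(0) = x0.
    return x0 * math.exp(-eps * (t + math.sin(t)))

def y_avg(t):
    # Solution of the averaged system y' = -eps * y, y(0) = x0.
    return x0 * math.exp(-eps * t)

# Error when the change of variables is truncated to the identity (x ~ y).
err_order0 = max(abs(x_exact(t) - y_avg(t)) for t in times)

# Error when the first order periodic correction is kept:
# x ~ y + eps * u_1(y, t) with u_1 = -y * sin t (zero mean over a period).
err_order1 = max(abs(x_exact(t) - y_avg(t) * (1.0 - eps * math.sin(t))) for t in times)

print(err_order0, err_order1)
```

In this toy case the identity truncation leaves an error of order $\varepsilon$, while keeping the first order term reduces it to order $\varepsilon^2$, which is the behavior the remainder $R_g$ and the truncated transformation are meant to capture.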

B. Singular Perturbation with Contraction 

1) Contraction theory [14]: Consider the system $\dot{x} = f(x,t)$. It is said to be contracting if all trajectories converge exponentially towards each other. A necessary condition for contraction is that there exists a metric $\Theta(x,t)$ such that $\Theta^T\Theta$ is uniformly positive definite and $\beta > 0$ such that

$$F = \dot{\Theta}\,\Theta^{-1} + \Theta\,\nabla f\,\Theta^{-1} \le -\beta I$$

$\beta$ is called the contraction rate. We also define $\chi$ as a bound on the condition number of $\Theta$. A useful lemma from contraction theory is the robustness lemma.

Lemma 1: If $f$ is contracting with rate $\beta$ and $R$ is a bounded perturbation such that $\dot{y} = f(y) + R(t)$, $y(t)$ converges to a $|R|/\kappa$ neighborhood of $x(t)$ where $\kappa = \beta/\chi$. We say that the system is $\kappa$-robust.
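The robustness lemma can be checked on the simplest contracting system. The sketch below is an illustration, not the paper's code: it assumes the scalar system $\dot{x} = -\beta x$ with the identity metric (so the condition number bound is 1 and $\kappa = \beta$), a hypothetical bounded perturbation $R(t) = r_{max}\sin 3t$, and a plain explicit Euler integration.

```python
import math

beta = 2.0           # contraction rate of the nominal system x' = -beta * x
r_max = 0.5          # bound on the perturbation |R(t)| (hypothetical value)
dt, steps = 1e-3, 20000

def perturbation(t):
    # Hypothetical bounded disturbance with |R(t)| <= r_max.
    return r_max * math.sin(3.0 * t)

x = y = 1.0          # nominal and perturbed trajectories start together
max_gap = 0.0
for k in range(steps):
    t = k * dt
    x += dt * (-beta * x)                      # nominal system
    y += dt * (-beta * y + perturbation(t))    # perturbed system
    max_gap = max(max_gap, abs(y - x))

bound = r_max / beta  # |R| / kappa with kappa = beta for the identity metric
print(max_gap, bound)
```

The observed gap stays below $|R|/\kappa$, and is in fact smaller because the lemma is a worst case statement that ignores the frequency content of the perturbation.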

2) Singular perturbation [16]: Consider the dynamic system

$$\nu\,\dot{z} = g(x,z)$$
$$\dot{x} = f(x,z,t)$$

Lemma 2: Assume that the fast system $\nu\,\dot{z} = g(x_0(t),z)$ is partially $\lambda/\nu$-robust with respect to $z$ (the $x$-$z$ coupling has been replaced by an external forcing). Write $\gamma(x)$ its equilibrium, and assume that there exists $d > 0$ such that $|\partial_x\gamma(x)\,f(x,z,t)| \le d$. Assume also that $f$ is Lipschitz in $z$ with constant $\alpha$. Assume finally that $z(t=0) = \gamma(x(t=0))$ and let $x_\gamma$ be the solution of the reduced singular perturbation system $\dot{x}_\gamma = f(x_\gamma,\gamma(x_\gamma),t)$ ($z$ is frozen to its equilibrium state). Then,

, , dav 

\X-Xy\ < ^— 



C. Utilizing higher order averaging and singular perturba- 
tions with contraction in Extremum Seeking 

The typical ES scheme is as follows. Let's denote $x$ the parameters to be optimized, $\xi$ the internal dynamics built in ES (filters/estimation) and $z$ the internal dynamics of the unknown system. The state equation is then:

$$\frac{d}{dt}\begin{bmatrix} z \\ x \\ \xi \end{bmatrix} = \begin{bmatrix} \frac{1}{\nu} A(z,x) \\ \varepsilon B_{\varepsilon,r}(C(z),\xi,t) \\ \varepsilon D_{\varepsilon,r}(C(z),\xi,t) \end{bmatrix} \qquad (8)$$

where $B$, $D$ are nonlinear functions dependent on the particular ES system. While ES usually depends on several small parameters (gains and cutoff frequencies), we assume that those parameters have been rescaled so that the system depends on a parameter of possibly small amplitude $\varepsilon$ and a set of parameters $r = [r_1 \dots r_p]$ of order unity. The slow system is

$$\frac{d}{dt}\begin{bmatrix} x_s \\ \xi_s \end{bmatrix} = \varepsilon \begin{bmatrix} B_{\varepsilon,r}(h(x_s),\xi_s,t) \\ D_{\varepsilon,r}(h(x_s),\xi_s,t) \end{bmatrix} \qquad (9)$$

with $X = [x;\xi]$ and $X_s = [x_s;\xi_s]$. $B$ and $D$ are known, as they are the design of ES, and $h = C(\gamma(\cdot))$ is the static output of the fast dynamics black box system to be optimized. Only $h$ is unknown.

If the hypotheses of the previous sections hold, applying Lemma 2 gives:

\X-t\<'^ (10) 

This result is useful as it bounds the difference between a system where a map is optimized (the limiting case for which ES is designed) and one where a black box dynamic system is optimized, as a function of its bandwidth.

Let's now assume that an ES scheme has been designed to operate on a map. With averaging theory, the change of variables $U$ is constructed for system (9) algorithmically as explained in the previous part. Applied to system (8) with the use of equations (7) and (10), this change of variables gives:

Fundamental Decomposition

$$X = Y + \delta U(Y,t) \qquad (11)$$
$$\dot{Y} = g_0(Y) + \delta g_s(Y) + R_g(Y,t) + R_\nu(Y,z,t) \qquad (12)$$

with $Y = [x_{av},\xi_{av}]$, where $\delta U = O(\varepsilon)$ is the non identity part of the coordinate transformation, $g_0$ gathers the dominant order, ideal terms (typically, the gradient or Newton descent), $\delta g_s = O(\varepsilon g_0)$ represents the computed departure from the ideal descent that is autonomous and polynomial in $\varepsilon$ up to order $n$, $R_g = O(\varepsilon^{n+1})$ is the higher order error of the slow system and $R_\nu$ is the error induced by the black box dynamics. In particular, if bounds are known on $f$ and its derivatives (gathered in the notation ⟦f⟧ $= (\|f\|, \|f'\|, \dots)$), each of those higher terms can be bounded: $|\delta U| \le K_1(\varepsilon,r,$⟦f⟧$)$, $|\delta g_s| \le K_2(\varepsilon,r,$⟦f⟧$)$, $|R_g| \le K_3(\varepsilon,r,$⟦f⟧$)$, $|R_\nu| = |\partial_Y U^{-1}(\dot{X} - \dot{X}_s)| \le K_4(\nu,\varepsilon,r,$⟦f⟧$)$.



Using this decomposition, a relatively simple optimization can be run to select $\nu$, $\varepsilon$ and the order unity parameters to maximize the search speed while keeping the error terms below some bounds. Particular possible meta-optimizations and useful relaxations are presented in the next section.

III. Applications 

A. 1 state, 1-D system 

To illustrate this technique, we apply the procedure to the simplest 1-state ES scheme from [6] and work through it step by step. For ease of computation, we assume the dither signal to be sinusoidal, although the same analysis can be carried out with any periodic signal. The state equation for this ES scheme is:

$$\dot{x} = -\eta\,h(x + a\sin t)\sin t \qquad (13)$$

Qualitatively, if $\eta$ is small, the right hand side is small so that $x$ varies slowly and its long term evolution is given by the short time average of the right hand side, which reduces to $\dot{x} = -a\eta\,h'(x)/2$ if $a$ is also small. The system therefore mimics a gradient descent scheme of $h$ at rate $a\eta/2$. Here, $\eta$ and $a$ are parameters set by the practitioner. In the cited literature, they were assumed to be small parameters, small enough that the approximations that make $x$ approximately driven by $\nabla h$ hold. It is important however to note that the descent speed is driven by $a\eta$, and that those parameters should therefore be as large as possible so that the search performs promptly. It appears that the amplitudes of $a$ and $\eta$ constitute a trade-off between speed and accuracy. Our interest is to use averaging theory to quantify the departure from the ideal gradient descent that the system mimics when the parameters are small, and to compute what finite gains are acceptable to make the search perform at a set precision while maximizing the search speed.
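The qualitative claim above can be checked by integrating scheme (13) next to its first order average. The sketch below is only an illustration: the quadratic map `h`, the gains `eta` and `a`, the initial condition and the hand-written RK4 integrator are all assumptions chosen for the example, not values from the paper.

```python
import math

eta, a = 0.1, 0.2     # hypothetical gain and dither amplitude
x_star = 1.0          # minimizer of the hypothetical map

def h(x):
    # Hypothetical quadratic map, so that h'(x) = x - x_star.
    return 0.5 * (x - x_star) ** 2

def es_rhs(x, t):
    # Right hand side of the 1-state ES scheme (13).
    return -eta * h(x + a * math.sin(t)) * math.sin(t)

def avg_rhs(y, t):
    # Dominant order average: gradient descent at rate a * eta / 2.
    return -0.5 * a * eta * (y - x_star)

def rk4_step(f, x, t, dt):
    k1 = f(x, t)
    k2 = f(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(x + dt * k3, t + dt)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

dt, steps = 0.02, 30000   # integrate over 600 time units
x = y = 0.0
for k in range(steps):
    t = k * dt
    x = rk4_step(es_rhs, x, t, dt)
    y = rk4_step(avg_rhs, y, t, dt)

print(x, y)
```

With these small gains both trajectories end close to the minimizer, but only after several hundred time units, which illustrates why one would like $a\eta$ to be as large as the accuracy requirements allow.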

Before moving further, we take advantage of this example to discuss the parametrization of ES. Averaging theory, as presented in the previous section and in the reference literature, is parametrized with one small parameter only. In the case of ES, there are several small parameters so that the averaging analysis cannot be performed as is. One way to bypass this theoretical limitation is to parametrize each parameter as a function of the averaging parameter $\varepsilon$: $\eta = \eta(\varepsilon)$ and $a = a(\varepsilon)$. We require $a$ and $\eta$ to be of class $C^\infty$. The most natural choice is to use power laws, $a = \varepsilon^n$ and $\eta = \rho\,\varepsilon^m$, $m, n \in \mathbb{N}^*$. This way, we separate in $\eta$ the "magnitude" part $\varepsilon^m$ and its fine tuning value $\rho$. Then, to the reparametrized system

$$\dot{x} = -\eta(\varepsilon)\,h(x + a(\varepsilon)\sin t)\sin t$$

corresponds an averaged system in coordinate $y$ defined by $x = U(y)$ such that

$$\dot{y} = \sum_{i=1}^{n} \varepsilon^i g_i(y)$$

where the $g_i$ are sums and products of $\eta$, $a$, and $h$ and its derivatives.



If m = n = l, the averaged system to the dominant order 

is 

y = -^h'{y)~^{an'^^h' + na'h^'^)+R, 
$$x = y + \eta\sin(t)\,h(y) + R_u$$

where $R_g = O(\varepsilon^5)$ and $R_u = O(\varepsilon^2)$ are higher order terms in $a$, $\eta$ that can be computed explicitly. The term $-\frac{a\eta}{2}h'$ is the ideal gradient descent that the scheme is intended to reproduce. The middle term in the equation for $y$ was noted $\delta g_s$ in the previous section. For a given function $h$, the speed of search is proportional to $a\eta$. Optimizing for the gains therefore implies some sort of maximization of $a\eta$.

The dynamics for $y$ is in the form of the general nonlinear dynamic system with noise $\delta g_s + R_g$ considered in the contraction theory lemma. Assuming that $h$ is $\kappa$-robust, Lemma 1 can be applied to show that $y$ follows the ideal trajectory $\dot{z} = -\frac{a\eta}{2}h'(z)$ with an error that is at most $\delta_1 = 2\,\frac{\|\delta g_s + R_g\|}{a\eta\kappa}$. Similarly, $|x - y| \le \delta_2 = \eta\|h\| + \|R_u\|$. Assuming that an estimate on ⟦h⟧ (and therefore on the $R_i$) is known, the gains can be chosen to maximize the search speed while keeping $\delta_1$ and $\delta_2$ below some tolerance error. In practice, it is possible to relax the constraints slightly by only considering the dominant error terms. There are several possible strategies and some are listed below:

1) The guaranteed meta-optimization consists in bounding the distance from the real system to the ideal system. A simple triangular inequality shows that $|x - z| \le \delta_1 + \delta_2$ so that the problem reduces to:

$$\max_{\eta,a>0}\ \eta a \quad \text{s.t.} \quad \sum_i \delta_i \le \Delta$$

This is the safest optimization, as it guarantees convergence and the error bounds ($x$ will converge to a $\Delta$-neighborhood of $x^*$). However, the higher order remainders are often complex so that, even if they can be bounded provided that bounds are known, the bound is unlikely to be tight, leading to overly conservative regimes. Also, the error due to the average system and the error due to the oscillation of the real system may not be of the same importance for the user, so that it might be fruitful to separate them in the constraints. Those two drawbacks motivate the next two strategies.

2) Splitting up the error from the average system (the DC error) $\delta_1$ and the oscillatory error $\delta_2$, the meta-optimization becomes:

$$\max_{\eta,a>0}\ \eta a \quad \text{s.t.} \quad \delta_i \le \Delta_i$$

3) Neglecting the highest order of the remainder greatly simplifies the expressions for the bounds. For instance, in the case where $\eta = \rho\,\varepsilon$ and $a = \varepsilon$, the 1-D problem reduces to:

$$\max_{\eta,a>0}\ \eta a$$





s.t. <; ^{ria^Wh^ 

arjK [16 

s.t. 77||/!|1 <A2 



2,Jl\ 



-ari'W^ifli 



In that case, since a truncation is made, a more aggressive search is recommended by the meta-optimization. The formal guarantee of convergence is lost though, so it is wise to check, after such a meta-optimization, that the neglected terms are indeed negligible.
4) The previous strategies are well suited when the objective is to locate the optimal point (such as when performing source/pollutant tracking for instance). In other situations it might be important to keep the real system $x + a\sin t$ close to $x^*$ at all times. The extension of strategy 1) is

$$\max_{\eta,a>0}\ \eta a$$




S.t. a ^ 



m,\\ 

arjK 



\\RJ<A 



Remarks: 

• In the case $a = \varepsilon$ and $\eta = \rho\,\varepsilon$, the terms in $\eta a^3$ and $a\eta^3$ in the constraint for $y$ have the same order. However, if $\eta = \rho\,\varepsilon^m$, $a = \varepsilon^n$ and $m \ne n$, the two terms won't necessarily have the same order. Therefore, for more general $m$, $n$, the symmetry may be broken, leading to a problem with monomial constraints

$$\max_{\eta,a>0}\ \eta a \quad \text{s.t.} \quad \eta^{p_1}a^{q_1} \le K_1, \quad \eta^{p_2}a^{q_2} \le K_2$$

It has a finite solution if $p_1/q_1 < 1 < p_2/q_2$, which is

$$\eta = K_1^{\frac{q_2}{p_1q_2 - p_2q_1}}\,K_2^{\frac{-q_1}{p_1q_2 - p_2q_1}} \quad \text{and} \quad a = K_1^{\frac{-p_2}{p_1q_2 - p_2q_1}}\,K_2^{\frac{p_1}{p_1q_2 - p_2q_1}}$$

• Assume this monomial form for $a$ and $\eta$ and $m > n$. Then, in the first constraint from case 3), the first term is dominant. Assuming that the expansion remains the same as in the $m = n = 1$ case and that both constraints are active brings $\eta = \Delta_2/\|h\|$ and $a = \sqrt{8\Delta_1\kappa/\|h^{(3)}\|}$.

• Two consistency checks should be performed to ensure meaningful results:

- are the higher order terms indeed small?

- is $\rho$ close to unity? Otherwise, the $m$, $n$ chosen to perform the expansion may not be the appropriate ones.

• Obviously, if some rough estimates on J/i] are known, 
they can be used to compute tighter bounds on 5g and 



B. One state case, fully worked example

Let's consider the optimization of $h(x) = -\cos(x) + x^3/6$, starting from $x = 1$. It is chosen as a toy example for all of its derivatives are bounded by 1. The cubic term is added to break the third order symmetry at the minimum $x^* = 0$. Of course, $x^*$ is not a global minimum, but for $a$, $\eta$ small enough, it is still the attractor.

Fig. 1. Performance map of ES as a function of $a$ and $\rho$ (log-log scale). The full lines indicate the speed of search, the dashed lines represent the error $|x - x^*|$. The red dot is the result from the meta-optimization.

We solve here the meta-optimization problem 3). The constraints require us to have estimates for the bounds on $h$, $h'$, $h''$, $h^{(3)}$ and $\kappa$. We take $\|h^{(i)}\| = \max_{x\in[-1,1]} |h^{(i)}(x)|$ and $\kappa = h''(0)$. We also set $\Delta_i = 0.01$ and start with $\eta = \rho\,\varepsilon$, $a = \varepsilon$. It gives the numerical solution $\eta = 0.01$, $a = 0.207$. Therefore $\rho \approx 0.05 \ll 1$. To bring $\rho$ closer to unity, we repeat the averaging (it has to be done up to order 7) with $\eta = \rho\,\varepsilon^3$, $a = \varepsilon$. The constraint for $y$ simplifies to $\frac{a^2\|h^{(3)}\|}{8\kappa} \le \Delta_1$. This time, as stated in the remarks, the problem can be solved analytically:

$$\eta = \frac{\Delta_2}{\|h\|}, \qquad a = \sqrt{\frac{8\Delta_1\kappa}{\|h^{(3)}\|}}$$

which brings, with the numeric values discussed before, $\eta = 0.01$ and $a = 0.209$ (and $\rho = 1.09$). The reason why the result hasn't changed much is that the term in $a\eta^3$ found when $m = n = 1$ in the first constraint is small already. The result is illustrated in figure 1. The operating point suggested by the procedure is close to but different from the optimal point for this very map, which is expected.
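The analytic gain selection of this worked example can be reproduced in a few lines. The sketch below rests on assumptions that should be checked against the original derivation: that the simplified first constraint is $a^2\|h^{(3)}\|/(8\kappa) \le \Delta_1$, that the second is $\eta\|h\| \le \Delta_2$, that both are active, and that $\eta = \rho\,\varepsilon^3$ with $a = \varepsilon$. The sup norms are estimated on a uniform grid over $[-1, 1]$.

```python
import math

def h(x):
    return -math.cos(x) + x ** 3 / 6.0

def h3(x):
    # Third derivative of h: h' = sin x + x^2/2, h'' = cos x + x, h''' = 1 - sin x.
    return 1.0 - math.sin(x)

grid = [-1.0 + 2.0 * k / 2000 for k in range(2001)]
norm_h = max(abs(h(x)) for x in grid)     # estimate of ||h|| on [-1, 1]
norm_h3 = max(abs(h3(x)) for x in grid)   # estimate of ||h'''|| on [-1, 1]
kappa = math.cos(0.0) + 0.0               # kappa = h''(0)
delta1 = delta2 = 0.01                    # tolerances on the two error terms

eta = delta2 / norm_h                               # from eta * ||h|| = delta2
a = math.sqrt(8.0 * delta1 * kappa / norm_h3)       # from a^2 ||h'''|| / (8 kappa) = delta1
rho = eta / a ** 3                                  # assumes eta = rho * eps^3 with a = eps

print(eta, a, rho)
```

Under these assumptions the computed gains land close to the values quoted in the text ($\eta = 0.01$, $a \approx 0.209$, $\rho \approx 1.09$); the small residual difference depends on how the norm of $h^{(3)}$ is estimated.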

C. Optimization of a dynamic system 

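In this part we show how the singular perturbation theory allows us to set the frequency of the dither. For simplicity, we will consider again the 1-state ES, extended with a dynamic system:

$$\frac{d}{dt}\begin{bmatrix} z \\ x \end{bmatrix} = \begin{bmatrix} -z + x + a\sin\omega t \\ k\,h(z)\sin\omega t \end{bmatrix} \qquad (14)$$
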

In the present article we illustrate the technique with a first order filter for ease of computation, but insist that the technique applies to any nonlinear, contracting system. To put system (14) in form (8), we change the timescale to $\tau = \omega t$ and introduce $y = x + a\sin\tau$ and the reduced parameter $\eta = k/\omega$:



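$$\frac{d}{d\tau}\begin{bmatrix} z \\ y \end{bmatrix} = \begin{bmatrix} \frac{1}{\omega}(-z + y) \\ \eta\,h(z)\sin\tau + a\cos\tau \end{bmatrix} \qquad (15)$$

Lemma 2 can be applied with $d = \|h\|\eta + a$, $\lambda = 1$, $\nu = \omega$, $\alpha = \|h'\|$, to get

$$|\partial_\tau y - \partial_\tau y_\gamma| \le \eta\,\omega\,\|h'\|\,(\eta\|h\| + a)$$

which can be transformed back into the $x$, $t$ variables:

$$|\partial_t x - \partial_t x_\gamma| \le \eta\,\omega^2\,\|h'\|\,(\eta\|h\| + a)$$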

We form the meta-optimization problem by relaxing \\h'\\ 
into |/z'(ji;)|. This gives an approximation for the worst case 
speed at the dominant order 

'arjco 



\8o\ 



-Tja)2(Tj|l/z|l+fl) 1/2' 



which defines the objective. Keeping the constraints on 5, 
from the previous part unchanged, the meta-optimization 
problem can be written as: 
'arjco 



max 

7),«,(U>0 



■ -rico^{ri\h\+a) 



s.t. $\frac{a^2\|h^{(3)}\|}{8\kappa} \le \Delta_1$

s.t. $\eta\|h\| \le \Delta_2$

which gives $\omega = 0.48$ and keeps $a$ and $\eta$ unchanged.

D. 1-D map optimization with filtering 

Consider the map optimization with first order low and 
high pass filters adapted from [5]: 

h = jj. + au) — /i) 



1) m| 



(16) 



with the dither m sinf where x is the parameter to be 
optimized, li is the estimate of h{x) and j is the estimate 
of h'{x) in the ES algorithm. Assuming that the parameters 
a, rj, jJ., 7 are all of order e, the average system becomes 

^2 



Jav — 



ay ~ 
ay 

y 



Ai<!/i„v + ^/i"|+(9(e^) 
^^h!'jar-'^~hh' 
AiV + 77Ai;/i" + ^M3)| + 0(e5) 



(17) 



Xav^-niav + 0{e'^) 

X — Xav + T77/!sinf + (9(e'*) 

where we have introduced h~h — h{xav) and j ~ j — h'{xav). 

Even more so that in the previous examples, there are 
several ways to optimize for the ES scheme. One is to 
maximize r\ while keeping the dominant orders of x — Xav 



and j/k below some fixed value Ai and A2. This is done 
by applying Lemma 1 do the dynamics of h and j. Taking 
the same h as in the previous part and estimating the 
bounds on \h\. . . j/z^^'j with their true maximum on [—1,1], 
we get (fl, Tj,^, 7) = (0.33,8.8 •10"^0.093, 3.8). Numerical 
experiments show that the system converges into the set error 
bound in about 100 oscillations. 

E. ES in two dimensions as an illustration of channels 
interaction 

Practical cases of problems of optimization are multi- 
dimensional. The added complexity in multi-dimensional 
problems arises from two phenomena: 

• the complexity related to "higher dimension op- 
timization", independently from the dynamic sys- 
tem/extremum seeking nature of the problem 

• the coupling between directions in the estimation of the 
derivatives of h 

In this section, we show that higher order averaging brings 
an elegant criterion for choosing the dithers that limit the 
coupling between channels. The result provided here is a 2- 
particle coupling similar to the 3-particle coupling from [10], 
[11]. For simplicity, we limit ourselves to the bidimensional 
case where h(xi,X2) is to be minimized: 

$$\dot{x}_1 = \eta\,d_1\,h(x_1 + a d_1,\ x_2 + a d_2)$$
$$\dot{x}_2 = \eta\,d_2\,h(x_1 + a d_1,\ x_2 + a d_2)$$



We start with a qualitative presentation. In order to estimate the derivative of $h$ with respect to each of the variables, the correlation between the dither signal $d_i$ and the output $h$ is measured. For instance, the average equation for $x_1$ is:

$$\overline{\dot{x}_1} \approx a\eta\left\{\overline{d_1 d_1}\,\partial_{x_1}h + \overline{d_1 d_2}\,\partial_{x_2}h\right\}$$

This qualitative analysis suggests that as long as $\overline{d_1 d_2} = 0$, the dynamics should reduce to the desired one:

$$\overline{\dot{x}_1} \approx a\eta\,\overline{d_1^2}\,\partial_{x_1}h$$



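This advocates for the possibility to use both sines and cosines for the dither, which is a priori good as it would allow a bandwidth twice smaller for a given number of channels. However, the averaged equations for $d_1 = \cos t$ and $d_2 = \sin t$ are:

$$\dot{y}_1 = \frac{a\eta}{2}\,\partial_{y_1}h - \frac{\eta^2}{2}\,h\,\partial_{y_2}h + O(\varepsilon^3)$$
$$\dot{y}_2 = \frac{a\eta}{2}\,\partial_{y_2}h + \frac{\eta^2}{2}\,h\,\partial_{y_1}h + O(\varepsilon^3)$$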



Although the term in $a\eta$ is the expected gradient, the term in $\eta^2$ is a precessing term, which, if dominant, keeps the system moving on level sets instead of following the gradient. The existence of this term advocates against the use of both sines and cosines in the ES (even if the system to be optimized has no dynamics), as it produces an undesired coupling between the channels at a dominant order.
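The precessing term can be exhibited with a short quadrature. The sketch below is an illustration under assumptions, not the paper's computation: it takes the dominant oscillatory part of the scheme, $f_1 = \eta\,d(t)\,h$ with $d = (\cos t, \sin t)$, uses the standard second order averaging formula in which the $\eta^2$ drift is the period average of $(\partial_x f_1)\,u_1$ with $u_1$ the zero mean time integral of $f_1$, and evaluates it at a point where the value `h_val` and gradient `grad` of $h$ are hypothetical numbers.

```python
import math

eta = 0.1
h_val = 1.3              # hypothetical value of h at the current point
grad = (0.4, -0.7)       # hypothetical gradient of h at the same point

n = 4000
drift = [0.0, 0.0]
for k in range(n):
    t = 2.0 * math.pi * k / n
    d = (math.cos(t), math.sin(t))
    # Zero mean time integral of f_1 = eta * d(t) * h (h frozen at h_val).
    u1 = (eta * h_val * math.sin(t), -eta * h_val * math.cos(t))
    # (d f_1 / d x) u_1 = eta * d(t) * (grad h . u_1)
    directional = grad[0] * u1[0] + grad[1] * u1[1]
    for i in range(2):
        drift[i] += eta * d[i] * directional / n

# Closed form of the same average: (eta^2 h / 2) * (-dh/dx2, dh/dx1).
expected = (-0.5 * eta ** 2 * h_val * grad[1], 0.5 * eta ** 2 * h_val * grad[0])
along_gradient = drift[0] * grad[0] + drift[1] * grad[1]
print(drift, expected, along_gradient)
```

The computed drift is orthogonal to the gradient, so at this order it moves the state along a level set of $h$ rather than across it, which is the coupling the text warns against.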



IV. Conclusion

In this paper, we have shown how contraction theory applied to singular perturbation and modern averaging theory can help bring qualitative and quantitative insights into Extremum Seeking. In particular we have shown methods for selecting the finite gains of ES schemes optimally. Although most techniques were presented for one dimensional objective functions, they extend to n-dimensional problems. Further work includes extending the study to formal n and to stochastic ES. The authors also believe that it constitutes a fruitful framework for designing new efficient ES schemes, particularly with adaptive gains. Lastly, it constitutes a basis for comparing optimally tuned ES schemes with other optimization techniques.

[20] J. Murdock, Normal Forms and Unfoldings for Local Dynamical Systems. Springer, 2003.